The accuracy of a neural network depends mainly on the selection of
appropriate data centers. The K-means algorithm is widely used across
disciplines for center selection. However, the method is known to be
sensitive to the initial choice of centers, and it also depends heavily on
the data points themselves. Over the years, the performance of K-means has
been enhanced from different perspectives, including the centroid
initialization problem. Unfortunately, existing solutions do not provide a
good trade-off between the quality of the centers produced and the
efficiency of the algorithm. To address this problem, a new method is
proposed that finds the initial centers and reduces the sensitivity of
K-means to them. This paper presents a training algorithm for the radial
basis function network (RBFN) that uses an improved K-means (KM)
algorithm, a modified version of KM based on a distance-weighted
adjustment of each center, known as the distance-weighted K-means (DWKM)
algorithm. The proposed training algorithm, which uses DWKM to select the
centers for training the RBFN, obtained better prediction accuracy and a
reduced network architecture compared with the standard RBFN. The
algorithm was implemented in the MATLAB environment, and the resulting
network undergoes a hybrid learning process. The network, called
DWKM-RBFN, was tested against the standard RBFN in prediction. The
experimental models were tested on four nonlinear functions from the
literature and four real-world application problems: an air pollutant
problem, a Biochemical Oxygen Demand (BOD) problem, a phytoplankton
problem, and the forex pair EURUSD. The proposed method was compared with
the standard RBFN in terms of root mean square error (RMSE). It yielded
promising results, with an average improvement of more than 50 percent in
RMSE.

1. INTRODUCTION


Radial basis function networks (RBFNs) are a type of artificial neural network (ANN) with certain
advantages over other kinds of ANNs: better approximation capabilities, simpler network structures and
faster learning algorithms. As a result, RBF networks have become popular among researchers, many of
whom have worked to produce training algorithms more effective than the standard techniques [1-7]. RBF
networks are useful in approximation problems, but training takes a long time when the number of training
data is large, and the error can be high because of possible invalid data or outliers in the training data.
Although Sarimveis [1] showed that combining clustering methods with RBF networks speeds up training,
the approach still produces a substantial error. This is because standard clustering algorithms lack the
ability to choose the most accurate and informative centers. The distance-weighted K-means (DWKM)
algorithm addresses this problem. Note that the more accurate the chosen centers, the more accurate the
information fed to the network during training, which leads to more accurate results. In this paper, a fast
algorithm for training RBF networks with high accuracy is presented, in which the input centers are
selected by the DWKM algorithm. The methodology is illustrated on eight experimental models: four
functions from the literature and four real-world problems with data obtained from Lim [8]. The
advantages of the presented learning strategy, DWKM-RBFN, are identified and the results are compared
with the standard RBFN.


2. RELATED WORKS 

One of the best features of neural networks is their ability to generalize and approximate sample data
without the need to specify equations and coefficients, particularly when an unknown model describes an
unknown complex relation and training data are abundant. Because of their strong generalization ability,
radial basis function networks (RBFNs) are usually selected for this purpose [9-21]. Furthermore, in this big
data era, datasets in many domains such as image processing, text categorization, biometrics and
microarrays have become so large that real-time systems require long processing times and large memory
storage to handle them. Under such conditions, approximation using the available datasets can become a
challenging and difficult task. The problem is more challenging for distance-based learning algorithms such
as RBFN [14, 22, 23], k-nearest neighbor [24-25], clustering methods [21, 26-29] and support vector
machines [30-32]. By default, the NN algorithm must search through all available training samples, which
requires large memory, and must perform distance-to-center calculations, which makes training slow for
approximation purposes. Additionally, because the NN stores the distances of all samples in the training
dataset, noisy distances are stored as well, which can degrade approximation accuracy. Recently, Mirjalili
[33] demonstrated that hybrids of RBFN with evolutionary algorithms such as particle swarm optimization
(PSO) perform well in classification and approximation problems. The use of evolutionary algorithms to
select more accurate centers for RBFN training is also reported in much recent literature [7, 33-38], and it is
indeed a good method if network training speed and computation cost are not the main concerns.

As the popularity of RBFN has grown, its application in areas such as classification [10-14], pattern
recognition [4, 16, 39] and prediction [15, 17-20, 40-43] has increased, demonstrating the wide use and
reliability of RBFN in many fields. However, RBFN accuracy depends mainly on the initial centers selected
from the dataset before network training begins [15, 39, 41, 44, 45]. The size of the training dataset and the
invalid data it contains also play an important role in determining training speed and accuracy [46-50].
Furthermore, it has been reported that the learning algorithm may perform worse as the dataset grows [51].
To solve these problems, researchers have proposed using clustering algorithms for center selection in
RBFN [1, 28, 29, 44, 52-58], both to obtain better accuracy and to avoid including invalid data in network
training. The most widely used clustering algorithm for center selection is K-means, as it is faster and less
complex, and yields good accuracy for center selection compared with other available clustering algorithms.
However, the conventional clustering algorithms available at present do not guarantee high accuracy in
choosing centers. It is reported that most conventional clustering algorithms are weak at extracting centers
from time-series datasets [59]. This matters for the present work, which includes two real-world time-series
datasets: the air pollutant dataset and the forex EURUSD dataset. Researchers have therefore sought ways
to overcome the weaknesses of existing clustering algorithms.

In the recent literature, Ismkhan [52] proposed an improved K-means algorithm with an iterative
multi-clustering approach for selecting centers while minimizing the objective function. The method
performs a first-level clustering and stores the first-generation centers, then runs a second-level clustering
to shorten the center distances. This method yields good accuracy but is computationally heavy. A similar
approach is found in the work of Yu et al. [53], who use a tri-level K-means to determine the centers. Their
algorithm can overcome noise and filter invalid data, but it is also costly in computation. Work on
improving K-means has also drawn on density concepts. Zhang et al. [28] introduced a density canopy to
improve the K-means algorithm and obtained good accuracy; the method also makes K-means less sensitive
to noisy data. It focuses on calculating the density of each object or data point in the dataset and selects
only the centers with the highest density of nearby points. Similar work appears in earlier literature
[56, 60-62], using density concepts that focus on computing the density of data points around the centers.

Kant and Ansari [63] introduced an initial center selection method for the K-means algorithm using
Atkinson indexes. Instead of randomly selecting the initial centers that K-means then adjusts, the authors
use this approach to reduce the possible error caused by invalid data under random selection. The Atkinson
index uses boundaries and inequality to determine the range of center selection for K-means. Earlier, Ding
et al. [64] used similar concepts in their Yinyang K-means algorithm, which uses upper and lower bounds
between a point and random initial centers to select suitable centers. The algorithm applies a complicated
computational process and is not easily understood by the regular researcher. In addition, some researchers
have applied kernel functions for distance calculation in K-means to obtain better classification results [58].
With a kernel function, a complicated dataset is mapped into a higher dimension, which makes it easier to
separate and classify. Tzortzis and Likas [65] introduced a method that assigns weights to clusters relative
to their variance. This work sheds new light on the use of statistical approaches in machine learning
algorithms; however, it is complicated and costly in computation. Melnykov and Melnykov [66] proposed
replacing the Euclidean distance in K-means with the Mahalanobis distance. The approach yields no
significant improvement in accuracy.

The literature above does not focus on a distance-based ratio or weight for selecting and determining
accurate center positions. However, the studies that focus on the density of each point provided insight for
our work on using a distance-based weight to select better centers for RBFN training. Using only the
distance ratio as the weight, center selection in K-means can proceed faster, without a complicated
algorithm or costly computation. The rest of this paper is organized as follows. Section 3 describes the
standard K-means algorithm, followed by the improvement made to it and the proposed training method
used for simulating and predicting the eight experimental models. Section 4 discusses the results for each
model and compares them with the standard RBFN in terms of accuracy. Finally, Section 5 concludes the
findings and discusses future work that could help improve the proposed training method.


3. METHODOLOGY 
3.1. Standard K-means algorithm 

The K-means (KM) algorithm [62] is one of the simplest unsupervised learning algorithms for solving
the well-known clustering problem. It finds clusters in a dataset such that a cost function of a dissimilarity
(distance) measure is minimized. The procedure follows a simple way to classify a given dataset into a
number of clusters fixed a priori. The main idea is to define K centres, one for each cluster. In other words,
K-means groups objects, based on their attributes or features, into K groups, where K is a positive integer.
The grouping is done by minimizing the sum of squared distances between the data and the corresponding
cluster centre [29, 56]. Thus, the purpose of the K-means algorithm is to classify the data.

The basic steps of the K-means algorithm are simple:

Iterate until stable (no object moves to another group):
a) Determine the centre coordinates.
b) Determine the distance of each object to the centres.
c) Group the objects based on minimum distance.


To describe the algorithm, we need some notation. A set of n vectors $x_j \in \mathbb{R}^d$,
$j = 1, \dots, n$, is to be partitioned into c groups $G_i$, $i = 1, \dots, c$. The cost function, based on the
Euclidean distance between a vector $x_k$ in group i and the corresponding cluster centre $c_i$, is defined by:

$$J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \left( \sum_{k,\, x_k \in G_i} \| x_k - c_i \|^2 \right) \qquad (1)$$

where $J_i = \sum_{k,\, x_k \in G_i} \| x_k - c_i \|^2$ is the cost function within group i.

The partitioned groups are defined by a c x n binary membership matrix U, where the element $u_{ij}$ is
1 if the jth data point $x_j$ belongs to group i and 0 otherwise. Once the cluster centres $c_i$ are fixed, the
minimizing $u_{ij}$ for (1) is derived as follows:

$$u_{ij} = \begin{cases} 1, & \text{if } \| x_j - c_i \|^2 \le \| x_j - c_k \|^2 \text{ for each } k \ne i \\ 0, & \text{else} \end{cases} \qquad (2)$$

which means that $x_j$ belongs to group i if $c_i$ is the closest centre among all centres.

On the other hand, if the membership matrix is fixed, i.e. if $u_{ij}$ is fixed, then the optimal centre $c_i$
that minimizes (1) is the mean of all vectors in group i:

$$c_i = \frac{1}{|G_i|} \sum_{k,\, x_k \in G_i} x_k \qquad (3)$$

where $|G_i|$ is the size of $G_i$, or $|G_i| = \sum_{j=1}^{n} u_{ij}$.

The algorithm is presented below:
a) Initialize the cluster centres $c_i$, i = 1, ..., c. This is done by randomly selecting c points from among
all the data points.
b) Determine the membership matrix U by (2).
c) Compute the cost function according to (1). Stop if either it is below a certain tolerance value or its
improvement over the previous iteration is below a certain threshold. Otherwise, update the cluster centres
according to (3).
d) Repeat steps b) and c) until convergence is achieved, i.e. until no object moves to another group,
then stop.
Since the location of the centres is not known in advance, the centre locations are adjusted based
on the current updated data. All the data are then assigned to the new centres. This process is repeated
until no data point moves to another cluster.
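
Steps a) to d) can be sketched in a few lines of Python with NumPy. This is a minimal illustration rather than the implementation used in the experiments (which were run in MATLAB); the function name `kmeans`, the tolerance `tol`, the iteration cap `max_iter` and the fixed random seed are all assumptions introduced for the example.

```python
import numpy as np

def kmeans(X, c, tol=1e-8, max_iter=100, seed=0):
    """Plain K-means following steps a) to d); returns centres, labels and cost."""
    rng = np.random.default_rng(seed)
    # a) pick c distinct data points as the initial centres
    centres = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    prev_cost = np.inf
    for _ in range(max_iter):
        # b) membership as in (2): each point joins its closest centre
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # c) cost as in (1): sum of squared distances to the assigned centre
        cost = d2[np.arange(len(X)), labels].sum()
        if prev_cost - cost < tol:
            break  # improvement below threshold: stop
        prev_cost = cost
        # update as in (3): each centre becomes the mean of its group
        for i in range(c):
            members = X[labels == i]
            if len(members) > 0:  # an empty group keeps its old centre
                centres[i] = members.mean(axis=0)
    return centres, labels, cost
```

Calling `kmeans(X, 10)` on an array `X` of shape (n, d) returns ten centres, which is the number of centers used later in the experiments.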


3.2. Distance Weighted K-Means (DWKM) Algorithm 
Typically, the K-means algorithm chooses its centers uniformly at random from the vector $R_n$. To
reduce this weakness of K-means during center selection, an improvement is made at Step 1, i.e. step a) of
the algorithm in Section 3.1. We propose an improved center selection method for the K-means algorithm
that uses distance as the weight. The improved K-means algorithm is presented as follows:

1a. Select a center $c_1$, chosen uniformly at random from the vector $R_n$.

1b. Assign a new center $c_i$, chosen from the vector $R_n$ with probability

$$\max \frac{\| x - c_1 \|^2}{\sum_{k=1}^{N} \| x_k - c_1 \|^2}, \quad i = 1, 2, 3, \dots, C,$$

where N is the total number of data points in $R_n$.

1c. Repeat Step 1b until all C centers are obtained.

The remaining steps of K-means, Step 2 to Step 4 (steps b) to d) in Section 3.1), then proceed as in the
standard K-means algorithm.
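
A possible reading of Steps 1a to 1c is sketched below in Python with NumPy. The printed weighting formula is difficult to recover exactly, so the sketch makes an explicit assumption: each new center is drawn with probability proportional to its squared distance from the nearest center already chosen (the same idea as k-means++ seeding). That weighting rule, the function name `distance_weighted_init` and the seed are illustrative choices, not necessarily the authors' exact procedure.

```python
import numpy as np

def distance_weighted_init(X, k, seed=0):
    """Pick k initial centres; later centres are drawn with distance-based weights."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Step 1a: first centre chosen uniformly at random
    chosen = [int(rng.integers(n))]
    # ASSUMPTION: weight = squared distance to the nearest centre chosen so far
    d2 = ((X - X[chosen[0]]) ** 2).sum(axis=1)
    while len(chosen) < k:  # Step 1c: repeat until k centres are obtained
        total = d2.sum()
        if total == 0:
            # every point coincides with a chosen centre: fall back to uniform
            probs = np.full(n, 1.0 / n)
        else:
            probs = d2 / total
        # Step 1b: draw the next centre using the distance weights
        nxt = int(rng.choice(n, p=probs))
        chosen.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[chosen].astype(float)
```

The returned array can replace the random initialization in step a) of the standard algorithm, after which the usual membership and update steps run unchanged. Points that coincide with an already chosen center receive zero weight, so the same location is not picked twice unless all points coincide.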


3.3. Radial Basis Function Network (RBFN)

The RBFN is considered as a three-layer network. The input nodes pass the input values to the
connection arcs. The internal units form a single layer of L RBF nodes; the Gaussian function is used in
this layer, giving localized response functions in the input space. The hidden node responses are weighted,
and the output nodes are simple summations of the weighted responses. The formulation of the training
algorithm involves a set of input-output pairs [x(i), y(i)], i = 1, ..., K, where x(i) is the N-dimensional input
vector, y(i) is the corresponding target or desired M-dimensional output vector and K is the number of
training examples. The set of input-output examples is the information base used to determine the values of
the unknown parameters, i.e. the hidden node centers and radii and the connection weights between the
hidden and output layers. The remaining parts of the RBFN procedure are calculated using standard methods.
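
The three-layer structure described above can be illustrated with a short Python sketch. It assumes that the centers are already fixed (for example by a clustering step), that all hidden nodes share one radius, and that the output weights are obtained by linear least squares; these are common choices but are assumptions here, and the function names are hypothetical. The experiments in this paper use MATLAB's newrb instead.

```python
import numpy as np

def rbf_design(X, centres, radius):
    """Gaussian hidden-layer responses: one row per input, one column per centre."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * radius ** 2))

def train_rbfn(X, Y, centres, radius):
    """Fit hidden-to-output weights by least squares (assumed training rule)."""
    H = rbf_design(X, centres, radius)
    W, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return W

def predict_rbfn(X, centres, radius, W):
    """Output nodes: weighted sums of the hidden node responses."""
    return rbf_design(X, centres, radius) @ W
```

Here `X` has shape (K, N), `Y` has shape (K, M) and `centres` has shape (L, N), matching the notation of the paragraph above.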


4. RESULTS AND DISCUSSION 

The proposed DWKM-RBFN was tested using four nonlinear functions from the literature: the
Santner et al. [67] function given in (4), the Lim et al. [68] function in (5), the Dette and Pepelyshev [69]
function in (6), and the Friedman [70] function in (7). For all four functions, the training set for the RBFN
consists of 400 randomly generated data points and the test set comprises 400 randomly generated data
points, both in the range [0,1].


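$$f(x) = \exp(-1.4x) \cos(3.5\pi x), \quad x \in [0,1] \qquad (4)$$

$$f(x) = \frac{1}{6} \left[ (30 + 5 x_1 \sin(5 x_1)) (4 + \exp(-5 x_2)) - 100 \right], \quad x_i \in [0,1],\ \forall i = 1, 2 \qquad (5)$$

$$f(x) = 4 (x_1 - 2 + 8 x_2 - 8 x_2^2)^2 + (3 - 4 x_2)^2 + 16 \sqrt{x_3 + 1}\, (2 x_3 - 1)^2, \quad x_i \in [0,1],\ \forall i = 1, 2, 3 \qquad (6)$$

$$f(x) = 10 \sin(\pi x_1 x_2) + 20 (x_3 - 0.5)^2 + 10 x_4 + 5 x_5, \quad x_i \in [0,1],\ \forall i = 1, 2, 3, 4, 5 \qquad (7)$$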
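
For readers who wish to regenerate comparable data, the four benchmark functions can be written as follows. The sketch assumes the commonly cited forms of the Santner, Lim, Dette and Pepelyshev, and Friedman functions; the function names, the uniform sampling and the seed are illustrative assumptions, since the paper states only that the points were randomly generated in [0,1].

```python
import numpy as np

def santner(x):
    """Function (4); x has shape (n, 1)."""
    return np.exp(-1.4 * x[:, 0]) * np.cos(3.5 * np.pi * x[:, 0])

def lim(x):
    """Function (5); x has shape (n, 2)."""
    x1, x2 = x[:, 0], x[:, 1]
    return ((30 + 5 * x1 * np.sin(5 * x1)) * (4 + np.exp(-5 * x2)) - 100) / 6

def dette(x):
    """Function (6); x has shape (n, 3)."""
    x1, x2, x3 = x[:, 0], x[:, 1], x[:, 2]
    return (4 * (x1 - 2 + 8 * x2 - 8 * x2 ** 2) ** 2
            + (3 - 4 * x2) ** 2
            + 16 * np.sqrt(x3 + 1) * (2 * x3 - 1) ** 2)

def friedman(x):
    """Function (7); x has shape (n, 5)."""
    x1, x2, x3, x4, x5 = (x[:, j] for j in range(5))
    return 10 * np.sin(np.pi * x1 * x2) + 20 * (x3 - 0.5) ** 2 + 10 * x4 + 5 * x5

# 400 training points drawn uniformly from [0, 1]^5 (seed chosen arbitrarily)
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(400, 5))
y_train = friedman(X_train)
```

A test set of the same size can be drawn in the same way with a different seed.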


DWKM-RBFN was also tested on four real-world datasets. The Biochemical Oxygen Demand (BOD)
concentration dataset, the phytoplankton growth rate and death rate dataset and the Texas air pollutant
dataset were obtained from Aik and Zainuddin [71]. The forex dataset for the EURUSD pair was collected
from the XM MetaTrader 4 database [72]. For the BOD and phytoplankton datasets, the training set consists
of 100 sets of data and the test set comprises 100 sets of data. For the air pollutant dataset, the training set
has 480 sets of data and the test set has 72 sets of data, both taken from hourly air data. For the EURUSD
pair, the training set consists of 519 sets of data taken from 2016 to the end of 2017, and the test set
consists of 155 sets of data taken from early 2018 to August 2018. The experiment was implemented using
the newrb function because it represents the general form of an RBF network, and the proposed clustering
method was implemented using MATLAB functions. The Gaussian basis function was used for both
networks, with other parameters such as the spread set to their default values, so that the performance of
the proposed network could be evaluated effectively [8]. The performance of the standard RBFN and
DWKM-RBFN was measured by comparing the computation time taken for training, the number of
iterations taken for convergence, and the root mean squared error (RMSE), which measures how well both
networks approximate the chosen functions and is given by

$$\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (O_i - P_i)^2 } \qquad (8)$$

where n is the number of predicted responses, $O_i$ is the target value at time-step i, and $P_i$ is the
predicted value of the model at time-step i.
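
The RMSE criterion, the square root of the mean squared difference between target and predicted values, is straightforward to compute. The helper below is a minimal sketch; the name `rmse` is chosen for illustration.

```python
import numpy as np

def rmse(target, predicted):
    """Root mean squared error between target values O_i and predictions P_i."""
    target = np.asarray(target, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((target - predicted) ** 2)))
```

For example, targets (0, 0) and predictions (3, 4) give squared errors 9 and 16, a mean of 12.5 and an RMSE of about 3.54.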

The number of centers is fixed at 10 for training on all eight datasets. For the air pollutant problem,
the pollutants monitored include carbon monoxide, nitric oxide, nitrogen dioxide, ozone and oxides of
nitrogen. For experimental purposes, the hourly air quality data obtained from Aik and Zainuddin [71] were
used to predict the trend of the pollutants of interest: nitric oxide, nitrogen dioxide and oxides of nitrogen.
For the phytoplankton problem, the growth rate and death rate are the values of interest. For the BOD
problem, the BOD concentration is the value of interest. Finally, for the forex EURUSD dataset, three
variables are used as training inputs, namely the daily highest price, the daily lowest price and the open
price, while the close price is the value to be predicted.


Table 1. Performance of DWKM-RBFN and Standard RBFN prediction results for datasets.


                 Standard RBFN                       DWKM-RBFN
Dataset          Average of    Standard Deviation    Average of    Standard Deviation
                 RMSE          of RMSE               RMSE          of RMSE

Santner          0.160980      0.132510              0.032115      0.015732
Lim              0.151190      0.053404              0.081227      0.006912
Dette            1.938260      1.318204              0.707767      0.157561
Friedman         1.333590      0.700492              0.066972      0.011266
BOD              0.000252      3.87e-07              0.000245      4.01e-06
Phytoplankton    0.004980      0.000933              0.003884      0.000858
Air Pollutant    4.203630      2.860265              0.137055      0.004483
EURUSD           0.031649      0.004812              0.026758      0.005649


Table 2. Percentage of Improvement for DWKM-RBFN over Standard RBFN by RMSE.


Dataset          Percentage of Improvement (%)
Santner          80.05
Lim              46.27
Dette            63.48
Friedman         94.98
BOD               2.72
Phytoplankton    22.01
Air Pollutant    96.74
EURUSD           15.45
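
The percentages in Table 2 can be reproduced from the average RMSE values in Table 1, assuming the improvement is defined as the relative reduction in RMSE, 100 x (standard - DWKM) / standard. That definition is not stated explicitly in the text and is inferred from the tabulated numbers; the variable and function names below are illustrative.

```python
# Average RMSE from Table 1: (standard RBFN, DWKM-RBFN)
table1 = {
    "Santner": (0.160980, 0.032115),
    "Lim": (0.151190, 0.081227),
    "Dette": (1.938260, 0.707767),
    "Friedman": (1.333590, 0.066972),
    "BOD": (0.000252, 0.000245),
    "Phytoplankton": (0.004980, 0.003884),
    "Air Pollutant": (4.203630, 0.137055),
    "EURUSD": (0.031649, 0.026758),
}

def improvement(std_rmse, dwkm_rmse):
    """Relative RMSE reduction in percent (assumed definition)."""
    return 100.0 * (std_rmse - dwkm_rmse) / std_rmse

percentages = {name: improvement(s, d) for name, (s, d) in table1.items()}
average = sum(percentages.values()) / len(percentages)

for name, value in percentages.items():
    print(f"{name:15s} {value:6.2f}")
print(f"{'Average':15s} {average:6.2f}")
```

With the rounded values of Table 1, seven of the eight recomputed percentages agree with Table 2 to two decimals; the BOD value comes out slightly different because its RMSE values are reported to only three significant figures. The average across the eight datasets is a little under 53 percent, consistent with the improvement of more than 50 percent reported in the abstract.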


The results in Table 1 show that the DWKM-RBFN network outperforms the standard RBFN in
average RMSE on all eight datasets and in standard deviation of RMSE on six of them, the exceptions
being the BOD and EURUSD datasets. All average RMSE values in Table 1 are calculated over ten runs of
each network. From Table 1, the DWKM-RBFN network surpasses the standard RBFN in accuracy and
network architecture while using a training set that consists of only 81.8%, 71.8%, 68.5% and 65.8% of the
total dataset size for the Santner, Lim, Dette and Friedman datasets, respectively. Likewise, DWKM-RBFN
training on the real-world BOD, phytoplankton, air pollutant and forex EURUSD datasets used only 81.8%,
71.8%, 68.5% and 66.8% of the data, respectively. This means that it is possible to select a suitable number
of data points that provides a network with reduced complexity, faster training time and improved accuracy.

Table 2 shows the percentage improvement in RMSE of the DWKM-RBFN network compared with
the standard RBFN. The DWKM-RBFN network outperforms the standard RBFN in accuracy by more than
80% for the Santner, Friedman and air pollutant datasets. For the Lim, Dette, phytoplankton and forex
EURUSD datasets, the improvement ranges from 15% to 64%. However, the BOD dataset shows no
significant improvement for DWKM-RBFN over the standard RBFN, with a percentage of less than 3%.
This is attributed to the high nonlinearity of the BOD data; in addition, the lack of additional input variables
that control the changes in BOD can lead to weak prediction results.

Among the real-world datasets, the phytoplankton and forex EURUSD datasets showed
improvements of 22.01% and 15.45%, respectively. Both datasets are highly nonlinear. For the
phytoplankton dataset, the low improvement is attributed to the possible existence of other environmental
factors not included in the dataset. For the forex EURUSD dataset, the low improvement is attributed to the
many possible factors that drive the movement of the currency but are not included in the dataset. Factors
that cannot be quantified, such as political influences or natural disasters, can affect currency fluctuation.
Hence, if such factors could be quantified and included in the dataset, a higher improvement would be
expected. From Table 1 and Table 2, we observe that the DWKM-RBFN network provides consistent and
satisfactory results even though it uses less data. This consistency is visible in Table 1, where the standard
deviation of RMSE for DWKM-RBFN is lower than that of the standard RBFN for six of the eight datasets.
Figure 1 shows the error bar plot comparing the errors of the two networks. The standard RBFN has a
larger average error than the DWKM-RBFN network for all datasets.

Both the DWKM-RBFN network and the standard RBFN performed well in the experiments on
datasets with a smaller range, lying in [0,1]. However, for datasets with a larger range, such as the air
pollutant dataset, the standard RBFN performed poorly in accuracy. The DWKM-RBFN network is superior
in accuracy and network architecture, and a proper adjustment of the number of centers for each dataset
could raise its accuracy further. The results in Table 1 show that, even with less data used for training,
choosing the right centers has an impact on prediction accuracy. Using a huge amount of data does not
always guarantee the desired prediction accuracy; a number of data points sufficient to represent the shape
of the distribution of the dataset is enough to provide good prediction results. Furthermore, much training
data may contain invalid data that could jeopardize the desired accuracy, not to mention the size of the
network it would create and the time taken for training.

Hence, the ability of both the DWKM-RBFN network and the standard RBFN in prediction is not in
doubt, but accuracy suffers considerably if a proper number of centers is not selected. As the amount of
data used by the network becomes smaller, the network architecture becomes much simpler and possibly
free of invalid data. Although both models provide satisfactory results, the network structure and accuracy
of the DWKM-RBFN network are superior to those of the standard RBFN network.


Figure 1. Error bar plot for datasets using standard RBFN and DWKM-RBFN.


5. CONCLUSION 

Four nonlinear functions from the literature and four real-world problems have been simulated in
this paper; the real-world problems are the prediction of BOD, phytoplankton, air pollution and forex
EURUSD prices. The performance of both networks has been compared using the root mean squared error
(RMSE) and its standard deviation as the criteria for performance measurement and prediction consistency.
Results from all eight studies show that the DWKM-RBFN network is better than the standard RBFN in
prediction accuracy and network architecture. It may be possible to further improve the accuracy of the
proposed network by using statistical methods to choose the best number of centers for different types of
dataset. In conclusion, the proposed network is superior to the standard RBFN network in both network
architecture and accuracy. Since self-organized selection of centers can be performed by clustering
algorithms, which select significant centers sufficient to represent the distribution of the dataset for the
hidden nodes, it would be interesting to test the networks with highly noisy training data to verify the
efficiency of the chosen clustering algorithm in center selection.